Summary - The sequence of the 3'-most 8300 nucleotides of the genome RNA of the Purdue-115 strain
of the transmissible gastroenteritis virus (TGEV), a porcine coronavirus, was determined from cDNA clones.
The available sequence corresponds to the part of the genome (total length >20 kb) expressed through
subgenomic mRNAs. The 5 subgenomic and the genomic RNA species detected in TGEV-infected cells
form a 3'-coterminal 'nested' structure, a unique feature of Coronaviridae.

The transcription initiation site of the TGEV subgenomic RNAs appears to involve the hexameric
sequence 5'CTAAAC, which is present upstream from each coding region. In addition to the previously
identified genes encoding the three structural proteins, E2, E1 and N, two regions, X1 and X2, corresponding
to the non-overlapping portion of mRNAs 4 and 3, may code for so far unidentified non-structural
polypeptides. The predicted X1 polypeptide (9.2 kDa) is highly hydrophobic. The sequence of the X2 region allows
the translation of two non-overlapping products, i.e., X2a (7.7 kDa) and X2b (18.8 kDa). No RNA species
liable to express the extreme 3' open reading frame X3 was found.


coronavirus / transmissible gastroenteritis / TGEV / messenger RNAs / genome structure / gene sequence /
non-structural polypeptides


Résumé - Transmissible gastroenteritis virus (TGEV): partial sequence, organization and
expression of the genomic RNA. The sequence of the 8300 nucleotides at the 3' end of the genomic RNA
of the porcine coronavirus TGEV (Purdue-115 strain) was established from cDNA clones. Relative
to the entire genome (>20 kb), this covers all of the sequences expressed through messenger RNAs
of subgenomic size. The 5 subgenomic RNA species and the genomic RNA detected in
infected cells form nested sequences that are co-terminal at the 3' end, which is characteristic of the mode
of replication of the Coronaviridae. A hexameric sequence, 5'CTAAAC, present just upstream from each
coding region, would constitute the transcription initiation site of the TGEV subgenomic RNAs. In addition to
the previously identified genes of the 3 structural proteins E2, E1 and N, two regions, X1 and X2,
corresponding to the 'unique' region of mRNAs 4 and 3, might code for non-structural polypeptides
not yet identified. One of the predicted polypeptides, X1 (9.2 kDa), is extremely hydrophobic.
Two completely distinct products, X2a (7.7 kDa) and X2b (18.8 kDa), might be translated from
mRNA 3. No RNA capable of expressing the coding frame located at the 3' end (X3) was
detected.

coronavirus / transmissible gastroenteritis / TGEV / messenger RNAs / genome structure / gene sequences /
non-structural polypeptides


* Author to whom correspondence should be sent.

Abbreviations: bp: base pair; IBV: infectious bronchitis virus; kb: kilobase; MHV: murine hepatitis virus; ORF: open reading frame;
SSC: saline sodium citrate; TGEV: transmissible gastroenteritis virus.


Introduction 

Transmissible gastroenteritis virus (TGEV), an important pathogen of swine neonates, belongs to the
Coronaviridae, a family of enveloped viruses with a large, positive-stranded RNA as their genome [1].
Earlier studies showed that the TGEV genome consists of a single RNA molecule, approximately
20 kb in length, which is polyadenylated and infectious, like that of other coronaviruses [2].
Although the total number of genes encoded has not yet been determined, the TGEV genome codes
for at least four polypeptides on the basis of existing protein and nucleotide data. The virions are
constructed of three polypeptides, the nucleocapsid (N), the membrane (E1) and the spike or
peplomer (E2) polypeptides, the complete sequence of each of which has recently been established
[3-5]. These three genes account for approximately 6.3 kb of coding information. In addition, at
least one non-structural polypeptide is synthesized during virus replication, an RNA-dependent RNA
polymerase, which requires Mg2+ cations and is probably membrane-bound [11].

Expression of the coronavirus-encoded information proceeds through the synthesis of several
distinct mRNA species of subgenomic size. The transcription strategy has been studied in detail in
the murine hepatitis virus (MHV) and infectious bronchitis virus (IBV) models. The intracellular
RNA species (7 and 6 in number, respectively, including the genome RNA) have been shown to form
a nested set with common 3' ends. The translated sequences correspond approximately to the 5'
portion which is absent in the next smaller RNA. The subgenomic RNAs contain leader and body
sequences joined through discontinuous transcription. This process relies upon the presence of a short
homologous sequence in each intergenic region, most likely acting as a recognition signal for the
polymerase-leader complex [6-10]. Less information is available concerning TGEV transcription.
The number of subgenomic RNA species synthesized in infected cells varies from 4 to 9 in the previous
literature [11-14].


The purpose of this paper is, first, to propose a model of TGEV genome organization and
expression based on both sequence analysis of cloned virion RNA and characterization of
virus-specific intracellular RNAs, and second, to describe the characteristics of additional
polypeptides possibly encoded by the genome.


Materials and methods 

Virus and cells 

The Purdue-115 strain of TGEV was propagated in the 
PD5-cell line and virions were purified as reported [15]. 

RNA extraction 

Purified virions were treated with proteinase K (200 units/ml; Merck) and 2% SDS for 30 min at 37°C.
RNA was extracted once with phenol and twice with phenol-chloroform (1/1) with gentle agitation.
After ethanol precipitation with sodium acetate (0.3 M), the RNA pellet was resuspended in sterile
bidistilled water and stored at -80°C. The extraction yield was 40-50 µg of RNA per 1 mg of
purified virion.

cDNA synthesis 

The purified RNA was denatured by methylmercuric hydroxide for 10 min at room temperature [16].
The final concentration of CH3HgOH in the reverse transcription reaction mix was optimized to 8 mM.
The reaction was carried out at 37°C for 2 h in 50 µl containing: 15 µg of extemporaneously
denatured RNA, RNasin (100 units; Promega Biotec, Madison), KCl (40 mM), MgCl2 (6 mM),
Tris-HCl (40 mM, pH 8.3, at 37°C), 2-mercaptoethanol (56 mM; i.e. 7-fold molar excess to CH3HgOH),
dATP, dCTP, dGTP, dTTP (0.5 mM each), [3H]dTTP (100 µCi, 30 Ci/mmol; Amersham); primers
p(dT)12-18 (Pharmacia) or pE2 (sequence specific, 30-mer [5]) (5 µg) and 'super' reverse
transcriptase (88 units; Stehelin, Basel). The reaction was stopped with EDTA (20 mM)
followed by phenol-chloroform extraction. The RNA-cDNA hybrids were precipitated with ethanol and
2 M ammonium acetate [17]. About 4 µg of cDNA were obtained from 15 µg of RNA.

RNase T2 treatment 

The RNA-cDNA hybrid material was subjected to RNase T2 treatment in a volume of 50 µl containing
NaCl (250 mM), sodium acetate (10 mM, pH 4.5) and RNase T2 (17 units; BRL) (S. Van der Werf,
Institut Pasteur, personal communication). After a 15 min incubation at 37°C the material was
extracted with phenol-chloroform, desalted in a centrifuged Sephadex G-50 column and ethanol
precipitated using 2 M ammonium acetate.


Tailing and cloning of cDNA 
Homopolymeric dC tails were added to RNA-cDNA hybrids (500 ng) by incubation (3 min at 37°C) in a
20 µl reaction mixture containing potassium cacodylate (100 mM), Tris-base (25 mM, pH 7.6),
CaCl2 (1 mM), DTT (0.2 mM), dCTP (0.2 mM), BSA (0.5 mg/ml; BRL) and terminal deoxynucleotidyl
transferase (675 units/ml; Pharmacia P.L.). dC-tailed RNA-cDNA hybrids were annealed to PstI-cut
dG-tailed pBR322 (BRL; 1.5 mg//d, i.e., 2-fold molar excess to RNA-cDNA hybrids) under
the following conditions: 20 mM Tris-HCl, pH 7.4; 300 mM NaCl; 1 mM EDTA; at 62°C for 15 min; at
57°C for 2 h, then cooled to room temperature. The mixture was used to transform competent
E. coli RR1 [18], which were plated onto L-agar containing 12 mg/ml of tetracycline. The percentage
of ampicillin-sensitive transformants ranged between 60 and 90% in the different experiments.


Screening and mapping 

The clones containing an insert exceeding 800 bp were selected [19]. A map of cloned inserts was
obtained by means of Northern and Southern blot hybridizations and hexanucleotide restriction
enzyme analyses [20]. For Northern blot experiments, total RNA of TGEV-infected PD5 cells was
extracted by the guanidinium isothiocyanate technique [21] and deposited on a 0.75% denaturing
agarose gel containing formaldehyde. RNA transferred onto nitrocellulose was hybridized with
nick-translated [32P]dCTP-labeled plasmids [20]. Filters were washed in 0.1x SSC + 0.1% SDS at
55°C for 1 h. In Southern blot experiments, identical hybridization and washing conditions
were employed.


DNA sequencing 

Sonicated plasmid fragments ranging from 500 to 700 bp were subcloned into SmaI-cut M13mp18 phage
vector [22]. The DNA sequence was determined with the chain termination method [23] using the
17-mer sequencing primer and [35S]dATP (600 Ci/mmol; NEN) as the label. The sequence was
determined on polyacrylamide buffer gradient gels [24]. The whole sequence was determined on both
strands. Sequencing data were analyzed using the Microgenie sequencing program (March 1985
version, Beckman). The supercoiled plasmid dideoxy-sequencing method [25] was occasionally
employed to confirm partial sequence data, using oligonucleotide primers synthesized on a
Biosearch 8600 apparatus.


Results 


Generation and mapping of cDNA library 

RNA extracted from purified TGEV, consisting of large-sized (>20 kb), homogeneous, potentially
full-length material, was reverse transcribed after oligo(dT) priming. Several discrete cDNA species,
most likely due to the existence of stable secondary structures in genome RNA, were produced (Fig. 1);
a well-defined band of approximately 18 kb, expected to encompass the major structural protein
genes, was visible. This material served to generate the pTG2 library. Six recombinant clones (2.15,
2.21, 2.26, 2.27, 2.40, 2.50) were oriented along the genome by means of Northern hybridization with
size-fractionated RNAs from TGEV-infected cells (Fig. 2). Clone pTG2.21 (and 2.15, data not shown)
contained sequences hybridizing with 6 RNA species, of which the largest one (RNA 1) had the
same size as that of virion RNA. Clone pTG2.50 hybridized with all species except RNA 6. Clone
pTG2.26 had common sequences exclusively with RNA 1 and 2, whereas clones pTG2.40 (and 2.27,
data not shown) possessed sequences only present in RNA 1. This result is consistent with the fact that
in coronaviruses, genome RNA and subgenomic RNA species form a nested set with common 3'
sequences. Additional clones were probed against clones 2.26 and 2.15, using Southern blotting. All
the selected clones were mapped by restriction enzyme analysis. The overlapping clones were shown
to stretch along 7 kb of DNA (Fig. 3). Clones 2.21, 2.15 and 2.26 were sequenced. Subsequently, a
second library (pTG6) was produced using a synthetic primer pE2 located 3.8 kb from the 3' end [5].
Resulting overlapping clones were found to extend the continuum up to 14 500 bases (Fig. 3), of which
8300 bases starting from the 3' end have been sequenced.

Fig. 1. Electrophoresis of cDNA synthesis products. [3H]-labeled cDNA material from two different
experiments was analyzed in denaturing 0.75% alkaline agarose gel. The estimated size of
the major discrete species is given in kilobases.

Fig. 2. Northern blot analysis of TGEV intracellular RNAs. Total RNA from TGEV-infected PD5 cells
was resolved in formaldehyde 0.75% agarose gel, transferred onto a nitrocellulose
filter, then hybridized with 4 different 32P-labeled plasmids (designated at the left). An
autoradiograph of each blot is shown. Migration was from left to right. The mRNA species detected
are numbered from 1 to 6.
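
The orientation logic used above can be sketched in a few lines of Python. This is only an illustration of the nested-set reasoning, not the authors' procedure: the 5' boundaries of the RNA bodies are taken from Table I, the genome length is represented by an assumed lower bound of 20 000 bases, leader sequences are ignored, and the probe intervals are hypothetical values chosen to reproduce the four hybridization patterns.

```python
# 5' boundary of each RNA body, in bases from the 3' end (Table I).
# RNA 1 is the genome; 20000 is an assumed lower bound for ">20 kb".
RNA_START = {1: 20000, 2: 8300, 3: 3780, 4: 2760, 5: 2470, 6: 1670}


def hybridizing_species(probe_near, probe_far):
    """Return the RNA species that share sequence with a probe.

    The probe covers bases probe_near..probe_far, counted from the 3' end.
    Every RNA runs from its own 5' boundary down to the common 3' end,
    so it overlaps the probe whenever it extends past probe_near.
    """
    if not 0 <= probe_near < probe_far:
        raise ValueError("expected 0 <= probe_near < probe_far")
    return sorted(n for n, start in RNA_START.items() if start > probe_near)


# Hypothetical probe intervals (not the mapped positions of the real clones).
print(hybridizing_species(200, 1500))    # pattern of pTG2.21: all six species
print(hybridizing_species(1800, 2400))   # pattern of pTG2.50: all except RNA 6
print(hybridizing_species(4000, 7000))   # pattern of pTG2.26: RNA 1 and 2
print(hybridizing_species(9000, 12000))  # pattern of pTG2.40: RNA 1 only
```

Under these assumptions, the further a probe lies from the 3' end, the fewer RNA species it detects, which is how the Northern patterns order the clones along the genome.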


Nucleotide sequence analysis 

Seven major open reading frames (ORFs) were identified by stop codon analysis (Fig. 4). As
previously reported, the 3 largest ones encode the major structural proteins, E2, E1 and N. In
addition, 4 ORFs exceeding 200 bases, designated X2a, X2b, X1 and X3, were detected. The sequence
segment extending from the 3' end of the E2 gene up to the 3' end of the genome (3920 nucleotides) is
displayed in Fig. 5 along with the translation of the main ORFs. During the course of this work,
sequences of the E1 and N genes and downstream sequences became available from another group
[3,26]. As seen in Fig. 5, there were only a few differences between the two sets of data. The stretch
of 111 nucleotides up to the poly(A) is lacking from our data.

Fig. 3. Restriction endonuclease map of part of TGEV genome (14.5 kb). The length and distribution of cDNA clones selected from
the pTG2 and pTG6 libraries are shown. The clones used for sequencing are marked by a solid circle. Open circles indicate clones
which have been partially sequenced using plasmid dsDNA as a template. Bottom: the 5 subgenomic RNA species identified by
Northern hybridization are positioned along the genome map. Restriction enzyme sites: ★: MI; □: EcoRI; V: HindIII; it: HpaI;
o: Art; •: PvuII; ■: XbaI; T: XhoI.

Fig. 4. Stop codon analysis of the virus sense RNA. A computer graphical output of the open reading frames within the first 8300
nucleotides from the 3' end is shown. Stop codons are represented by vertical bars. Bars with an open triangle indicate proximal
ATGs in the corresponding frame. Arrowheads beneath the lower frame mark the position of every 5'CTAAAC hexamer found in
the sequence.
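
The kind of analysis summarized in Fig. 4 can be illustrated with a minimal Python sketch. It is not the Microgenie program used by the authors. The function names and the 30-base toy sequence are assumptions made for the example, and positions are 0-based offsets within that toy sequence rather than genome coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HEXAMER = "CTAAAC"


def stop_positions(seq, frame):
    """0-based start positions of the stop codons found in one reading frame.

    frame is 0, 1 or 2 and gives the offset of the first codon.
    """
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def motif_positions(seq, motif=HEXAMER):
    """0-based start positions of every occurrence of the motif."""
    seq = seq.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


# Toy sequence invented for the example (not taken from the TGEV genome).
toy = "GAACTAAACTTATGGCATAAGGCTAAACGA"

for frame in range(3):
    print("frame", frame, "stops at", stop_positions(toy, frame))
print("CTAAAC at", motif_positions(toy))
```

A stretch of a frame that is free of stop codons and begins with an ATG is then a candidate ORF, which is how the seven ORFs of Fig. 4 are read off the plot.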


A remarkable feature of the sequence was the presence of an identical hexamer, 5'CTAAAC,
upstream from the E2, X2a, X1, E1, N and X3 ORFs (Figs. 4 and 5). As suggested for MHV and
IBV (see Introduction), these homologous sequences are likely to act as initiation sites for the
transcription of each mRNA species. Accordingly, it was postulated that the CTAAAC located
immediately upstream from the X2a and X1 ORFs should correspond to the start of mRNAs
3 and 4, respectively (see Discussion). The non-overlapping region of mRNA 4 appeared to
contain a single ORF, X1* (246 bases). The predicted sequence of mRNA 3 might allow translation of
two ORFs: X2a, 213 bases long and starting 24 bases downstream from the CTAAAC sequence;
and X2b, 495 bases long and starting 570 bases downstream. Three more points were noted
regarding X2b: 1) no stop codon occurred up to 267 nucleotides upstream from the potential initiation
codon (position 715, Fig. 5); 2) with its 3' end partially overlapping the X1 ORF, X2b is the sole ORF
to stretch into the 'unique' sequence of the adjacent smaller RNA; 3) the sequence of the whole
X2 region was established on 4 independent clones (see Fig. 3). Surprisingly, 2 of them (pTG2.15 and
2.33) lacked the same 13-base sequence (discontinuous box near position 1000 in Fig. 5); this
created an alternative ORF, X2b', only 294 bases long and ending at position 1019 (stop codon
overlined).
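
The ORF sizes quoted above translate directly into codon counts, and the effect of the 13-base deletion follows from simple modular arithmetic. The sketch below is illustrative only. It assumes that the quoted lengths exclude the stop codon, which is consistent with the polypeptide lengths given in the Discussion but is not stated explicitly. The function names are hypothetical.

```python
# ORF lengths in bases, as quoted in the text (assumed to exclude the stop codon).
ORF_LENGTHS = {"X1": 246, "X2a": 213, "X2b": 495, "X2b'": 294}


def codon_count(n_bases):
    """Number of codons in an ORF of n_bases nucleotides."""
    if n_bases % 3 != 0:
        raise ValueError("an ORF length must be a multiple of 3")
    return n_bases // 3


def keeps_reading_frame(indel_length):
    """True if an insertion or deletion of this length leaves the frame intact."""
    return indel_length % 3 == 0


for name, length in ORF_LENGTHS.items():
    print(name, length, "bases =", codon_count(length), "codons")

# 13 is not a multiple of 3, so the deletion shifts the downstream frame.
print("13-base deletion keeps the frame:", keeps_reading_frame(13))
print("codons lost in X2b':", codon_count(495) - codon_count(294))
```

Because the deletion shifts the frame, translation downstream of it runs into a different stop codon, which is why the two clones predict the shorter X2b' product.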


Discussion 


Organization and expression of TGEV genome

About 14 500 nucleotides of the 3' end region of the TGEV genome were cloned in the pBR322 vector
and mapped. All clones used in our study were derived by direct cloning of an RNA-DNA
heteroduplex. According to the size (up to 5 kb) and distribution of the copy fragments, this simple
method appeared to be as efficient as that of Gubler & Hoffman (dsDNA synthesis using RNase H) in
the case of IBV RNA cloning [27]. Moreover, although we used oligo(dT) instead of random
priming, clones mapped at more than 14 kb from the 3' end.

The sequenced part, 8300 nucleotides of the 3' region, spanned the complete portion of TGEV
RNA expressed through subgenomic-size RNAs, whereas the portion left unsequenced presumably
encodes the polymerase. As shown in Fig. 4, the region sequenced comprises the 3 genes encoding
the major structural proteins N, E1 and E2, already identified on the basis of their predicted
translation products [3-5]. Additionally, three regions, X2a, X2b and X1, might code for
non-structural or, less probably, minor structural polypeptides so far unidentified.


* The X1 ORF was observed to contain a 15-base out-of-frame sequence, 5' ATTATATTGATATTA, identical to an in-frame
sequence found near the 3' end of the E2 gene ([5]; not shown).

[Fig. 5 sequence listing: nucleotides 1-3920, from the 3' end of the E2 gene to the poly(A), with one-letter amino acid translations of the ORFs.]

Fig. 5. Sequence of the 3'-most 3920 nucleotides of TGEV genome. The open reading frames are translated in one-letter amino acid
code. The homologous sequences CTAAAC are boxed. The line upstream from X2b ORF indicates a frame without stop codon.
A glycosylation signal present in the X2b product is underlined. Nucleotide and amino acid differences with another published
sequence (from position 1400-3820) are indicated. The 111-base-long sequence from the star to the poly(A) is taken from [3].

As a striking feature, each coding region (except X2b) was preceded by a short consensus sequence,
5'ttCTAAAC, similar to those observed in the genome of MHV (AATCTAAAC, [9]) and IBV
(CTgAACAA, [8]). Thus, we believe that these homologous sequences correspond to the sites of
transcription initiation in the TGEV genome. This assumption is strengthened by the finding that the
measured size of the non-overlapping region of each intracellular RNA species was in accordance with
its predicted size (data summarized in Table I). It is worth mentioning that the sequence
CTAAAC was never present internally in a TGEV ORF, except in one case, about 150 bases after the
start of the E2 gene (Table I; [5]). The CTAAAC sequence located upstream from the X3 ORF, for
which no corresponding intracellular RNA species was identified (see below), might also be
non-functional for mRNA transcription. If confirmed, this would suggest that additional factors govern
the reinitiation of the RNA polymerase-leader complex.

Our results demonstrate that TGEV intracellular RNAs form a 3' co-terminal 'nested' set (Fig. 2),
a feature of Coronaviridae. In addition, the RNA species pattern is in agreement with that recently
published by others [14]. Typically, RNA 5 encoding E1 (2.5 kb) and the less abundant RNA 4 (3 kb)
appear to be close to each other in size, unlike what was reported by another group [13]. An additional
poly(A+) RNA species, 0.7 kb long and rather rare, could have been a candidate for the extreme
3' ORF called X3. However, it was not detected by Northern hybridization using a cDNA probe [14].
A similar result was obtained in our experiments in which total intracellular RNA was analyzed.

The overall view of our data led us to propose the model of the structure of the TGEV genome
depicted in Fig. 6. Its organization appears to be 'intermediate' between those of MHV and IBV.
Like IBV, TGEV possesses 5 subgenomic mRNAs and lacks a subgenomic RNA species larger than
the E2-encoding RNA 3, which exists in MHV. On the other hand, the E1 and N genes are adjacent
in both MHV and TGEV genomes. The coding regions of the TGEV genome are densely packed
overall, yet there are almost no overlaps. The intergenic regions consist of 0-15 bases, except the
E2-X2a junction, which is 120 bases long (Fig. 5). Every subgenomic RNA species appears to be
functionally monocistronic, except RNA 3, which potentially allows the translation of two
non-overlapping products, X2a and X2b. It is noteworthy that MHV RNA 5 and IBV RNA D also possess
a sequence arrangement which might imply an internal initiation of protein synthesis [28,29]. This
shared feature might be of biological significance, for instance as a deliberate limitation of the
synthesis of the product encoded by the downstream ORF.


Table I. Comparison between the nucleotide position of the homologous regions and the calculated size of the
non-overlapping regions of each subgenomic RNA.

TGEV homologous       Base distance     Adjacent   Predicted size (a) of the body sequence    RNA species
regions               from the 3' end   ORF        Nucleotide data (b)  Experimental data (c)

5' GTA CTAAAC TT 3'   8300              E2         4.5                  4.5                   2
   CTT CTAAAC TA      8150              -          4.4                  -                     Not detected
   GAA CTAAAC TT      3780              X2a        1                    1.1                   3
   GTT CTAAAC GA      2760              X1         0.3                  0.4                   4
   GAA CTAAAC AA      2470              E1         0.8                  0.7                   5
   TAA CTAAAC TT      1670              N          1.2                  1.8                   6
   GAA CTAAAC GA      510               X3         0.5                  -                     Not detected

(a) In kilobases.
(b) Distance between the two closest homologous sequences.
(c) Difference of size between an RNA species and the next smaller one as measured in a denaturing gel.


Fig. 6. Compared organization of the genome of three coronaviruses: porcine TGEV, murine MHV and avian IBV. An
encircled number or letter placed on the left of a sequence segment indicates the encoding RNA species. The genes coding for
the three major structural proteins (peplomer E2, membrane E1 and nucleocapsid N) are represented by hatched boxes. The
diagrams of MHV and IBV genomes have been constructed using data from [1,28,29].
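
The 'nucleotide data' column of Table I can be recomputed from the hexamer positions alone. The sketch below is one reading of footnote (b), not the authors' own calculation. It assumes that each size is the distance to the next hexamer closer to the 3' end, that the hexamer lying inside the E2 gene is not counted as a boundary, and that the last interval runs to the 3' end itself. The function name is hypothetical.

```python
# (distance from the 3' end in bases, adjacent ORF) for each CTAAAC in Table I.
# None marks the hexamer located inside the E2 gene.
HEXAMERS = [
    (8300, "E2"),
    (8150, None),
    (3780, "X2a"),
    (2760, "X1"),
    (2470, "E1"),
    (1670, "N"),
    (510, "X3"),
]


def predicted_body_sizes(hexamers):
    """Map each hexamer position to the predicted size, in kb, of its body.

    The size is the distance to the nearest hexamer closer to the 3' end
    that precedes an ORF, or to the 3' end when there is none.
    """
    sizes = {}
    for position, _orf in hexamers:
        downstream = [p for p, orf in hexamers if p < position and orf is not None]
        boundary = max(downstream) if downstream else 0
        sizes[position] = round((position - boundary) / 1000, 1)
    return sizes


for position, size in predicted_body_sizes(HEXAMERS).items():
    print(position, size)
```

With these assumptions the computed values agree with the nucleotide data column, and they can be compared row by row with the gel measurements. The largest gap between the two columns is for RNA 6 (1.2 versus 1.8 kb).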

Potential primary translation products of 
mRNAs 4 and 3 

The X1 ORF, encoded by the 5' sequences of mRNA 4, potentially directs the synthesis of an
82 amino acid long polypeptide of 9241 Da, which appears to be extremely hydrophobic (Fig. 7).
Its composition is very unusual, with 32% leucine + isoleucine residues. The codon usage of
X1 does not differ from that of the structural protein genes (data not shown). In particular, codon
ATC is infrequently used for isoleucine (1/14), a bias which would not be expected from a chance
ORF. The first available ATG is in an unfavorable context (CxxAUGA) for translation initiation [31].
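
A hydrophilicity profile of the kind shown in Fig. 7 is a running average of per-residue values over a heptapeptide window. The sketch below uses the Hopp & Woods scale [30] as commonly tabulated. The function names and the test peptide are assumptions for illustration only. The peptide is invented and is not the X1 sequence.

```python
# Hopp & Woods hydrophilicity values (positive = hydrophilic).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(peptide, window=7):
    """Running average of Hopp & Woods values over a sliding window."""
    values = [HOPP_WOODS[aa] for aa in peptide.upper()]
    if len(values) < window:
        raise ValueError("peptide shorter than the averaging window")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def leu_ile_fraction(peptide):
    """Fraction of residues that are leucine or isoleucine."""
    peptide = peptide.upper()
    return sum(peptide.count(aa) for aa in "LI") / len(peptide)


# Invented peptide: a charged stretch followed by a hydrophobic one.
demo = "KKKKKKKLLLLLLL"
profile = hydrophilicity_profile(demo)
print([round(v, 2) for v in profile])
print("Leu + Ile fraction:", leu_ile_fraction(demo))
```

A polypeptide whose profile stays below zero over most of its length, as reported for X1, would be described as hydrophobic overall.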

mRNA 3 potentially allows the synthesis of two products, X2a and X2b, 71 and 165 amino
acids long, respectively. Both ORFs have ATG codon flanking sequences (TxxAUGG, TxxAUGT)
which function poorly as initiation signals [31]. Their codon usage suggests that they are not chance
ORFs (data not shown). The hydrophilicity profile of X2a (7711 Da) did not reveal any special
feature. The X2b product (18833 Da) was shown to be hydrophobic overall, with a markedly acidic
C-terminus comprising a cluster of 4 glutamic acid residues (position 1180, Fig. 5). As pointed out, the
sequence of 2 of the 4 clones spanning this region predicted an alternative product X2b', 67 amino
acids shorter at the C-terminus than X2b (X2b': 11413 Da). This finding might reflect a
heterogeneity of the virus population, although a cloning artifact cannot be ruled out completely.
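
The predicted masses can be sanity-checked against the polypeptide lengths by computing the mean mass per residue. The sketch below is only a plausibility check. The 100-125 Da acceptance range is an assumption, chosen around the commonly quoted average of roughly 110 Da per residue. The length of X2b' (98 residues) is derived from the statement that it is 67 residues shorter than X2b.

```python
# (length in amino acids, predicted mass in Da) as given in the text.
PRODUCTS = {
    "X1": (82, 9241),
    "X2a": (71, 7711),
    "X2b": (165, 18833),
    "X2b'": (165 - 67, 11413),
}


def mean_residue_mass(length, mass_da):
    """Average mass per residue, in Da."""
    return mass_da / length


def is_plausible(length, mass_da, low=100.0, high=125.0):
    """True if the mean residue mass falls in an assumed typical range."""
    return low <= mean_residue_mass(length, mass_da) <= high


for name, (length, mass) in PRODUCTS.items():
    print(name, round(mean_residue_mass(length, mass), 1), is_plausible(length, mass))
```

A mass off by a factor of 1000, as would result from confusing Da with kDa, falls far outside this range and is caught immediately.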

It is presently difficult to reconcile the above information with experimental data available for
TGEV. In vitro translation of mRNA 3 produced a 24 kDa polypeptide which neither comigrated
with any intracellular viral protein nor could be immunoprecipitated with anti-virion protein
antibodies [14]. A 16-17 kDa non-structural polypeptide, which was unglycosylated and which
induced a late antibody response in the host, has been characterized in TGEV-infected cells [32]. A
non-structural polypeptide of similar Mr (15 kDa) has been observed in our laboratory, but the latter was
shown to incorporate [35S]cysteine (B. Delmas & H. Laude, unpublished results), whereas no Cys
residue is predicted in X2b. Finally, no smaller polypeptide with an Mr approaching that of X1 or
X2a has been identified so far.

Computer investigations revealed no convincing homologies at the DNA or protein level between
the TGEV X1 or X2a sequences and the 'non-structural' genes of IBV [29,33] and MHV [28,34]
(data not shown). However, the TGEV X1 product (Fig. 7) and the highly hydrophobic 7.5 kDa
polypeptide predicted by the sequence of IBV mRNA B [33] might have a common (yet unknown)
function. In addition, TGEV X2b shows some similarities with the IBV 12.4 kDa (mRNA D) and
MHV 10.2 kDa (mRNA 5) translation products [29,28]. They are all produced from a downstream
ORF, are hydrophobic overall except for the C-terminus and have an unusually high tyrosine
content (7-10%). A low sequence homology between these IBV and MHV polypeptides has already
been pointed out [29]. In conclusion, the marked resemblance between the structural polypeptides of
coronaviruses does not extend to the above-mentioned gene products. Some of them may prove
to be key factors in the virus cycle, for instance in transcription-replication switching. One way to
achieve their characterization would be to use antisera directed against synthetic peptides derived
from the sequence so as to facilitate their identification in infected cell extracts.

Fig. 7. Hydrophilicity plot of the predicted X1 polypeptide. Running average taken over a heptapeptide
using the values of Hopp & Woods [30].